SUBSTITUTE SPECIFICATION (MARKED-UP VERSION)
TITLE OF THE INVENTION 

FFT OPERATING APPARATUS OF PROGRAMMABLE PROCESSORS AND OPERATION 
METHOD THEREOF 

CROSS-REFERENCE TO RELATED APPLICATIONS 

[0001] This application claims the benefit of Korean Patent Application No. 2002-78393 filed 
December 10, 2003, in the Korean Intellectual Property Office, the disclosure of which is
incorporated herein by reference. 

BACKGROUND 

1. Field of the Invention

[0002] The present invention relates to a fast Fourier transform (FFT) operating apparatus and
an operation method thereof. More particularly, in a programmable processor which can be
used with a variety of standards, enabling processing of high speed telecommunication
algorithms on a real-time basis and also guaranteeing flexibility in system design, the
present invention relates to an FFT operating apparatus and a method thereof for carrying
out an FFT operation, which is the kernel function of DMT (Discrete MultiTone) and OFDM
(Orthogonal Frequency Division Multiplexing) modems.

2. Description of the Related Art 

[0003] Generally, fast Fourier transforms (FFTs) are used in a variety of fields of communication
systems, such as with an asymmetric digital subscriber line (ADSL), a wireless asynchronous
transfer mode (ATM), and a short distance wireless communication network, and in applications
such as a matched filter, a spectrum analysis, and a radar. The FFT is required for the
establishment of OFDM, i.e., the next-generation high speed telecommunication algorithm.
The FFT is an algorithm that transforms a signal in a time domain into a signal in a frequency
domain. Since the FFT significantly reduces the number of operations required for a Discrete
Fourier Transform (DFT) by using the periodicity of trigonometric functions, operations are
carried out with increased efficiency. The DFT is expressed by the following formula 1:






Docket No.: 1349.1363 



[Formula 1]

$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \qquad W_N = e^{-j 2\pi / N}, \qquad k = 0, 1, \ldots, N-1$$

[0004] By re-arranging x(n) in formula 1 into odd-numbered and even-numbered samples,
respectively, the N-point DFT is divided into two N/2-point DFTs and expressed as the
following formula 2:

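[Formula 2]

$$X(k) = \sum_{n=0,\ \mathrm{even}}^{N-1} x(n)\, W_N^{nk} + \sum_{n=0,\ \mathrm{odd}}^{N-1} x(n)\, W_N^{nk}
= \sum_{l=0}^{N/2-1} x(2l)\, W_N^{2lk} + \sum_{l=0}^{N/2-1} x(2l+1)\, W_N^{(2l+1)k}$$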

[0005] As the formula 2 is repeated, the N-point DFT is divided into several 2-point DFTs, and 
this process is referred to as the radix-2 DIT (Decimation-in-Time) FFT. 
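
The splitting described by formulas 1 and 2 can be sketched in a few lines of Python. This is only an illustrative sketch, not the apparatus described later: the function names `dft` and `fft_radix2_dit` are hypothetical, and the sign convention W_N = exp(-j*2*pi/N) is assumed.

```python
import cmath


def dft(x):
    """Direct N-point DFT per formula 1: X(k) = sum over n of x(n) * W_N^(n*k)."""
    n_points = len(x)
    w = cmath.exp(-2j * cmath.pi / n_points)  # assumed convention for W_N
    return [sum(x[n] * w ** (n * k) for n in range(n_points))
            for k in range(n_points)]


def fft_radix2_dit(x):
    """Recursive radix-2 DIT FFT per formula 2; len(x) must be a power of two."""
    n_points = len(x)
    if n_points == 1:
        return list(x)
    even = fft_radix2_dit(x[0::2])  # N/2-point DFT of the samples x(2l)
    odd = fft_radix2_dit(x[1::2])   # N/2-point DFT of the samples x(2l+1)
    out = [0j] * n_points
    for k in range(n_points // 2):
        t = cmath.exp(-2j * cmath.pi * k / n_points) * odd[k]  # W_N^k times odd part
        out[k] = even[k] + t                   # multiplication, then addition
        out[k + n_points // 2] = even[k] - t   # multiplication, then subtraction
    return out
```

Each pass of the loop is one butterfly: a multiplication by a twiddle factor followed by an addition and a subtraction, which is the order of operations that suits a multiplier-accumulator.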

[0006] Among the methods used to split the DFT of formula 1, radix-2 and radix-4 DIT FFTs are
the methods most frequently used for implementation.

[0007] The radix-2 DIT FFT is split into odd-numbered and even-numbered samples as in
formula 2, while the radix-4 DIT FFT is split into four sets. In a comparison of these
two FFTs, the radix-2 DIT FFT has a simpler butterfly structure, and thus requires fewer
multipliers and less area. However, the number of stages increases in the radix-2 DIT FFT,
and thus it uses many more operation cycles than the radix-4 DIT FFT. The radix-4 DIT FFT
permits high speed processing, but it has a complicated butterfly structure and increases
the number of multipliers. Also, calculations for butterfly input data and addresses are
complicated and difficult to implement. Additionally, since the radix-4 DIT FFT applies only
to an FFT of length 4^n, it has to be used in combination with the radix-2 DIT FFT for an
FFT of length 2^n.

[0008] Further, the FFT is divided into DIT (Decimation-In-Time) FFT and DIF
(Decimation-In-Frequency) FFT according to whether the division operation is based on a time
domain or a frequency domain. Formula 2, which is divided with respect to the time domain, is
categorized as a DIT FFT. If the division operation is performed with respect to X(k) in the
frequency domain, the FFT is a DIF FFT.

[0009] In a digital signal processor, the DIT FFT is usually used as the FFT. While the DIF FFT
performs addition/subtraction and then multiplication, the DIT FFT, as shown in FIG. 1,
performs multiplication and then addition/subtraction. Accordingly, for a digital signal
processor based on a multiplier-accumulator, the DIT FFT is more suitable for operations.

[0010] For example, a DSP 56600 core is a fixed-point digital signal processor that consists
of one 16x16 multiplier-accumulator (MAC) and one 40-bit ALU (arithmetic and logic unit), and
carries out a radix-2 complex FFT butterfly operation using two parallel move instructions.
Since the DSP 56600 core has the configuration of a single multiplier-accumulator, the DSP
56600 core has a small area, but less operating efficiency than a dual multiplier-accumulator.
The DSP 56600 core requires 8N+9 cycles to perform N radix-2 complex FFT butterfly operations.

[0011] FIG. 2 shows another example of an operator using the DIT FFT, i.e., a Carmel™ DSP
core by Infineon Technologies AG. The Carmel™ DSP core is a 16 bit fixed-point decimation
core, which includes two multiplexers 11, 11' to select values for a data memory, two latch
registers 12, 12' to store selected outputs from the multiplexers 11, 11', and data bus
switches 13, 13' to switch data from data operations and data from a data memory so as to
supply a corresponding operator in accordance with a desired operation. The Carmel™ DSP core
also includes registers 14, 14' storing data for input to a next-stage
multiplier-accumulator, a first arithmetic unit 15 having a 16x16 MAC, a 40-bit ALU, an
exponenter and a shifter for a block floating point operation, a second arithmetic unit 16
having a 16x16 MAC and a 40-bit ALU, and an accumulator bank 17 to accumulate and store
results obtained in the first and second arithmetic units 15, 16 and switched by the data bus
switches 13, 13'. The Carmel™ DSP core, which adopts a CLIW (Configurable Long Instruction
Word) architecture, carries out up to 6 operations including 2 parallel data moves in a
single cycle. Also, since the Carmel™ DSP core supports an automatic scaling mode, an
overflow generated in the FFT operations is handled without having to use an additional
cycle. However, the Carmel™ DSP core has a complex hardware configuration since the Carmel™
DSP core is designed with the CLIW architecture to allow the parallel processing of the
operations. The Carmel™ DSP core requires 2N+2 cycles to perform N radix-2 complex FFT
butterfly operations.

[0012] FIG. 3 shows yet another example of an operator using the DIT FFT, i.e., a Starcore™
SC140 operator. The SC140, applying a VLIW (Very Long Instruction Word) architecture,
includes two data memory buses 21, 21' to send/receive data to and from the data memory. The
SC140 also includes eight shifter/limiters 22 to shift or limit the operated data stored in
the data register and load the data to the data memory buses 21, 21', the data register to
store an input and an output of operation units, and four 40-bit ALUs 24, 25, 26, 27. Since
each of the ALUs 24, 25, 26, 27 has a MAC, it is possible to carry out up to four MAC
operations or ALU operations in a single cycle. As a result, using the four MACs, the FFT
operations are carried out with fewer operation cycles than the digital signal processor
that has a single or dual MAC.

[0013] However, the Starcore™ SC140 has a large size and consumes a lot of power due to the
integration of many of the operation components. Further, it is difficult to efficiently
allot the operation components due to the data dependency, and it is difficult to read or
write required data from/into the memory during a single cycle due to a lack of a data bus.
As a result, a bottleneck may occur so that the performance of the four MAC structure cannot
reach twice that of the dual MAC structure.

[0014] In performing N complex FFT butterfly operations using the SC140, 1.5N cycles are
required. The above digital signal processors focus on increasing the number of the operators
to accelerate the FFT butterfly operation or adjusting the data path to fit the butterfly
operation flow. However, the reduction of the number of operation cycles of the butterfly is
limited with respect to the limited number of the operators.






[0015] Assuming that two cycles are required for the butterfly operation, and since
$(N/2)\log_2 N$ butterflies are needed for the N-point FFT, $(2N/2)\log_2 N$ cycles are
needed for the N-point FFT if other influences are not considered. In fact, during the FFT
operation, operation cycles may be additionally generated for data movement or data address
calculations.

[0016] Table 1 shows a comparison of the number of butterfly operation cycles and the number
of N-point FFT operation cycles of the Carmel DSP core and the TMS320C62x. As shown in
Table 1, additional cycles are required beyond the butterfly operation cycles. For the Carmel
DSP core, $(2N/2)\log_2 N$ cycles are needed for the butterfly operation, and for the
TMS320C62x, $(4N/2)\log_2 N$ cycles are needed.



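[Table 1]

| Digital signal processor | Number of butterfly operation cycles | Number of N-point FFT operation cycles |
|--------------------------|--------------------------------------|----------------------------------------|
| Carmel DSP               | 2                                    | $(2N/2)\log_2 N + 5N/4 + 10\log_2 N + 4$ |
| TMS320C62x               | 4                                    | $(4N/2)\log_2 N + 7\log_2 N + N/4 + 9$   |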
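
As a quick arithmetic check, the two expressions of Table 1 can be evaluated directly. The sketch below is illustrative only; the function names are hypothetical, and N is assumed to be a power of two so that all terms are integers. For N = 256 and N = 1024 the results agree with the cycle counts listed for the same two processors in Table 2.

```python
def log2_exact(n_points):
    """Return log2(N) for a power-of-two N, rejecting any other value."""
    if n_points < 1 or n_points & (n_points - 1):
        raise ValueError("N must be a power of two")
    return n_points.bit_length() - 1


def carmel_fft_cycles(n_points):
    """Carmel DSP row of Table 1: (2N/2)log2 N + 5N/4 + 10 log2 N + 4."""
    stages = log2_exact(n_points)
    return (2 * n_points // 2) * stages + 5 * n_points // 4 + 10 * stages + 4


def c62x_fft_cycles(n_points):
    """TMS320C62x row of Table 1: (4N/2)log2 N + 7 log2 N + N/4 + 9."""
    stages = log2_exact(n_points)
    return (4 * n_points // 2) * stages + 7 * stages + n_points // 4 + 9


def butterfly_only_cycles(n_points, cycles_per_butterfly):
    """Cycles spent in the butterflies alone: L * (N/2) * log2 N."""
    return cycles_per_butterfly * (n_points // 2) * log2_exact(n_points)
```

For the Carmel DSP core at N = 256, the butterflies alone account for 2048 of the 2452 cycles; the remainder is the additional overhead discussed in the text.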



[0017] FIG. 4 shows an operation of a general 8-point radix-2 DIT FFT. In the case of the
N-point FFT operation, there are $\log_2 N$ stages and N-1 groups. Accordingly, there are
three stages and seven groups shown in FIG. 4, and as the number of the stages increases, the
number of the butterflies in the group increases or decreases.

[0018] The FFT operation is carried out in one stage and then repeated in the next stage.
Within a stage, the operation is carried out by the group. In using C or assembly language to
implement the FFT, as shown in FIG. 5, three looping instructions are used for the operations
of the stages, the groups, and the butterflies in each group, which may vary according to the
architectures of a programmable processor and the program. Generally, three or four cycles
are required to carry out the looping instruction in the digital signal processor. Assuming
that L cycles are required for a single butterfly operation and M cycles are required to
carry out the looping instruction, the number of the cycles to carry out the N-point FFT
operation is obtained through the following formula 3.

[Formula 3]

$$(L \times N/2)\log_2 N + M \times (N-1) + M \log_2 N + \alpha$$



[0019] In formula 3, the value of the expression $(L \times N/2)\log_2 N$, which is
determined by L, may be changed according to the number of the MACs and the ALUs in the
digital signal processor, and the value of the expression $M \times (N-1) + M \log_2 N$,
which is determined by M, may be changed according to the configuration of a program
controller in the digital signal processor.
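
To make the two contributions in formula 3 concrete, the following sketch separates the butterfly term from the looping term. It is a minimal illustration under stated assumptions: the function name and the default of zero for the additional cycles α are hypothetical, and N is assumed to be a power of two.

```python
def formula3_cycles(n_points, butterfly_cycles, loop_cycles, extra_cycles=0):
    """Evaluate formula 3 and return (butterfly term, looping term, total).

    butterfly_cycles is L, loop_cycles is M, and extra_cycles is alpha.
    """
    if n_points < 2 or n_points & (n_points - 1):
        raise ValueError("N must be a power of two")
    stages = n_points.bit_length() - 1                       # log2 N
    butterfly_term = butterfly_cycles * (n_points // 2) * stages
    loop_term = loop_cycles * (n_points - 1) + loop_cycles * stages
    return butterfly_term, loop_term, butterfly_term + loop_term + extra_cycles
```

With L = 2 and M = 3, a 256-point FFT spends 2048 cycles in butterflies and 789 cycles in looping instructions before any data move or address calculation is counted, which shows why reducing the non-butterfly cycles matters.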

[0020] In the butterfly operation for a group of a stage, the address of input data is
increased by one. When the group is altered, the address of input data of a butterfly varies
according to the size of the group. In formula 3, α is used to denote the number of cycles
required for this address modification and the cycles required for the data move. If the
parallel processing is feasible as in the VLIW processor, the number of the additional
operation cycles, except for the butterfly, is reduced to some degree by parallel-processing
diverse instructions through the assembly coding. However, the reductions due to the parallel
processing are not sufficient. Referring to FIG. 4, the address modification according to the
alteration of the group is described. For example, "a" in the first butterfly of the stage 2
(① in FIG. 4, group 1) is a memory address 0 and "b" is a memory address 2. In FIG. 4, "a" in
the second butterfly of the stage 2 (② in FIG. 4, group 1) is a memory address 1, and "b" is
a memory address 3. In FIG. 4, "a" in the third butterfly of the stage 2 (③ in FIG. 4,
group 2) is a memory address 4, and "b" is a memory address 6. The address of the input data
"a" in the group 1 increases from 0 by a value of 1. As the operation progresses from the
group 1 to the group 2, the address of "a" changes from 1 to 4. That is, as the group is
changed, the address increment of the input data also changes.
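
The address pattern in this example can be reproduced with a short sketch. The function names are hypothetical, and the sketch assumes that the distance between inputs "a" and "b" (called `span` here) equals the number of butterflies per group, which matches the stage 2 addresses quoted above.

```python
def butterflies_by_group(n_points, span):
    """List (group, address of a, address of b) for one stage.

    span is the distance between "a" and "b", and also the number of
    butterflies per group (an assumption matching the stage 2 example).
    """
    group_size = 2 * span
    pairs = []
    for group, base in enumerate(range(0, n_points, group_size), start=1):
        for step in range(span):
            a_addr = base + step          # increases by one inside a group
            pairs.append((group, a_addr, a_addr + span))
    return pairs


def total_groups(n_points):
    """Sum the number of groups over all stages of an N-point radix-2 FFT."""
    total, span = 0, 1
    while span < n_points:
        total += n_points // (2 * span)
        span *= 2
    return total
```

Calling `butterflies_by_group(8, 2)` returns the pairs (0, 2) and (1, 3) for group 1 and (4, 6) and (5, 7) for group 2, so the address of "a" steps by 1 inside a group but jumps from 1 to 4 at the group boundary. `total_groups(8)` gives the seven groups mentioned for FIG. 4.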

[0021] As aforementioned, to reduce the number of the operation cycles of the N-point FFT in
the programmable processor such as the digital signal processor, it is required to minimize
the additional operation cycles except for the butterfly operation cycles. However, since the
conventional digital processors do not support a hardware structure to reduce the additional
operation cycles except for the cycles required for the butterfly operations, it is difficult
to reduce the number of the operation cycles.

SUMMARY OF THE INVENTION 

[0022] An aspect of the present invention is to provide a fast Fourier transform (FFT) operating
apparatus and an operation method thereof to reduce operation cycles that are additionally
generated in a programmable processor except for a butterfly operation.

[0023] To achieve the above aspect of the present invention, a radix-2 complex FFT operation
method to carry out an FFT operation in the programmable processor includes generating a start
signal and applying an FFT operation signal if the FFT starts, generating an offset address of
butterfly input/output data to read data and write an operated result in a data memory, and
storing the generated offset address of the butterfly input/output data in an offset register
of a programmable processor. The method further includes switching data to provide the
butterfly input data from the data memory and write the output data in the data memory,
carrying out a butterfly operation using two multiplier-accumulators, an arithmetic and logic
unit, and an exponenter, and generating a stop signal and resetting the FFT operation signal
when the operation is ended. For example, using operation instructions SBUTTERFLY (subtract
butterfly) and ABUTTERFLY (add butterfly), the FFT operation apparatus carries out the FFT
operation.

[0024] According to an aspect of the present invention, even in a programmable processor in
which performance is not enhanced through the acceleration of the butterfly operation,
performance is enhanced by minimizing operation cycles generated during a looping
instruction, data move, and address calculation of butterfly input data except for the
butterfly operations.

[0025] Additional aspects and/or advantages of the invention will be set forth in part in the 
description which follows and, in part, will be obvious from the description, or may be learned by 
practice of the invention. 

BRIEF DESCRIPTION OF THE DRAWINGS 

[0026] The above aspects and other features of the present invention will become more 
apparent by describing in detail a preferred embodiment thereof with reference to the attached 
drawings, in which: 

FIG. 1 is a view showing a structure of a DIT FFT butterfly;

FIG. 2 is a view showing a configuration of the conventional Carmel DSP core operator
by Infineon Technologies AG;

FIG. 3 is a view showing a configuration of the conventional SC140 operator by
Starcore™;

FIG. 4 is a flow graph showing an operation of a conventional 8-point radix-2 DIT FFT;

FIG. 5 is a view showing a programming architecture of an FFT using a looping
instruction;

FIG. 6 is a view showing a configuration of a programmable processor for FFT according
to an aspect of the present invention;

FIG. 7 is a flow graph showing an operation of a butterfly according to an aspect of the
present invention;

FIG. 8 is a flow chart showing the generation of an offset address of DIT butterfly data;

FIG. 9 is a view showing a configuration of an operator carrying out the operation of FIG. 8;

FIG. 10 is a view showing a configuration of a data processor carrying out the DIT
butterfly operation according to an aspect of the present invention;

FIG. 11A is a view showing a configuration of a dual multiplier-accumulator having
two separate multiplier-accumulators;

FIG. 11B is a view showing a configuration of a dual multiplier-accumulator using a
3-input adder;

FIG. 11C is a view showing a dual multiplier-accumulator carrying out functions of FIGS.
11A and 11B using a multiplexer; and

FIG. 12 is a view showing a configuration of a data bus switch of the data processor.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT 

[0027] Reference will now be made in detail to the embodiments of the present invention, 
examples of which are illustrated in the accompanying drawings, wherein like reference 
numerals refer to the like elements throughout. The embodiments are described below to 
explain the present invention by referring to the figures.

[0028] Hereinafter, the present invention will be described in detail with reference to the
accompanying drawings.

[0029] FIG. 6 shows a fast Fourier transform (FFT) operating apparatus to speedily perform an
N-point radix-2 DIT FFT operation without generating additional cycles except for butterfly
operations. Referring to FIG. 6, the FFT operating apparatus includes a program controller
110, a program memory 120, an FFT address generator 130, an address generator 140, a data
processor 150, a data memory 160, and a flag register 170.

[0030] The program controller 110 generates an FFT start signal and controls a
programmable processor. The program memory 120 stores an application of the programmable
processor. The FFT address generator 130 generates an offset address of FFT butterfly
input data and an operation stop signal. The address generator 140 uses the offset address
generated in the FFT address generator 130 to calculate an address of the data memory 160.
The data memory 160 stores data, and the data processor 150 uses the data stored in the data
memory 160 to carry out an arithmetic and logic operation. The flag register 170 generates an
FFT operation signal.

[0031] The data processor 150 includes a data bus switch circuit to receive the butterfly input
data from the data memory 160 and to write output data in the data memory 160, and a
butterfly operation circuit having two multiplier-accumulators to multiply and accumulate the
data and one arithmetic and logic unit. The data processor 150 also includes an exponential
operation circuit to carry out an exponential operation of the data during the butterfly
operation, an input register to store data memory values, and an accumulator to store
operation results and reuse the stored data for the operation.

[0032] FIG. 7 is a flow graph of the butterfly operation according to an aspect of the present
invention, which shows a butterfly of FIG. 1 as a complex operation. The complex operation
is represented as the following formula 4, where "a" and "b" denote the butterfly input data,
"c" and "d" denote the butterfly output data, and "w" denotes a twiddle factor. Subscripts "r"
and "i" respectively denote a real part and an imaginary part of each data.

[Formula 4]

$$c_i = a_i + w_r b_i + w_i b_r$$



[0033] To operate a single complex butterfly, six pieces of input data are required and four
pieces of output data are generated. The operation is divided into two cycles, and
implemented using a data memory configuration capable of reading three pieces of input data
and writing two pieces of output data in a single cycle. In a first cycle, two of the four
pieces of input data are multiplied and subtracted. This operation is carried out according
to an operational instruction, for example, SBUTTERFLY. In a second cycle, two of the four
pieces of input data are multiplied and added. This operation is carried out according to an
operational instruction, for example, ABUTTERFLY.
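
The two-cycle split can be illustrated with the sketch below. It assumes the standard DIT butterfly c = a + w·b and d = a − w·b, which is consistent with the expression for the imaginary part of c in formula 4; the assignment of the real outputs to the first cycle and the imaginary outputs to the second cycle, and the function signatures, are assumptions made for illustration rather than the instruction definitions themselves.

```python
def sbutterfly(a_r, b_r, b_i, w_r, w_i):
    """First cycle (assumed): two products are subtracted, giving the real parts."""
    product_r = w_r * b_r - w_i * b_i      # real part of w * b
    return a_r + product_r, a_r - product_r    # c_r, d_r


def abutterfly(a_i, b_r, b_i, w_r, w_i):
    """Second cycle (assumed): two products are added, giving the imaginary parts."""
    product_i = w_r * b_i + w_i * b_r      # imaginary part of w * b
    return a_i + product_i, a_i - product_i    # c_i, d_i


def complex_butterfly(a, b, w):
    """Combine both cycles into the complex outputs c and d."""
    c_r, d_r = sbutterfly(a.real, b.real, b.imag, w.real, w.imag)
    c_i, d_i = abutterfly(a.imag, b.real, b.imag, w.real, w.imag)
    return complex(c_r, c_i), complex(d_r, d_i)
```

Each function uses two multiplications and one add or subtract of the products, which maps onto two multiplier-accumulators, and each produces two of the four output values, in line with writing two pieces of output data per cycle.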

[0034] The program controller 110 controls a program of a programmable processor. Also, the
program controller 110 decodes an FFT instruction, transmits an N value from the N-point FFT
to the FFT address generator 130, and generates the FFT operation start signal. The FFT
address generator 130 receives the N value and the operation start signal from the program
controller 110 to generate the offset address of the data.

[0035] FIG. 8 shows a method to generate the offset address of the data in the FFT address
generator 130, which includes starting the FFT upon the FFT start signal having a value of 1,
and initializing a group count, a loop count, and a group count max value to a value of 1. A
group offset value is set to a value of -1, a loop count max value to a value of N/2, and an
offset address value of the twiddle factor to 0 when the FFT starts. The method further
includes calculating an address of input data A by adding the group offset and the loop count
value, and an address of input data B by adding the group offset, the loop count, and the
loop count max value. If the loop count value is not equal to the loop count max value, the
loop count value is increased by 1 and the method resumes from calculating the addresses of
the input data A, B. If the loop count value is equal to the loop count max value, the loop
count value is initialized to 1, the group offset value is set to a value obtained by
multiplying the loop count max value by 2 and adding the group offset value, and the twiddle
factor is increased by 1. If the group count is not equal to the group count max value, the
group count is increased by 1 and the method resumes from calculating the addresses of the
input data A, B. If the group count value is equal to the group count max value, the method
initializes the group count value to 1, the group offset value to -1, and the twiddle factor
to a value of 0, divides the loop count max value by two, and multiplies the group count max
value by two. If the group count max value is greater than N/2, the method generates the
operation stop signal and ends the FFT operation. If the group count max value is not greater
than N/2, the method resumes from calculating the addresses of the input data A, B.
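
The flow described for FIG. 8 can be traced with a small software model. This is a sketch that follows the steps as stated in the text, not the hardware itself; the generator name and the tuple layout (address of A, address of B, twiddle factor offset) are hypothetical, and N is assumed to be a power of two.

```python
def fft_offset_addresses(n_points):
    """Yield (addr_a, addr_b, twiddle_offset) in the order described for FIG. 8."""
    group_count = loop_count = group_count_max = 1
    group_offset = -1
    loop_count_max = n_points // 2
    twiddle = 0
    while True:
        addr_a = group_offset + loop_count
        addr_b = group_offset + loop_count + loop_count_max
        yield addr_a, addr_b, twiddle
        if loop_count != loop_count_max:
            loop_count += 1                      # next butterfly in the group
            continue
        loop_count = 1
        group_offset += 2 * loop_count_max       # address jump to the next group
        twiddle += 1
        if group_count != group_count_max:
            group_count += 1                     # next group in the stage
            continue
        group_count, group_offset, twiddle = 1, -1, 0   # next stage
        loop_count_max //= 2
        group_count_max *= 2
        if group_count_max > n_points // 2:
            return                               # operation stop signal
```

For N = 8 the model produces twelve address pairs, that is (N/2)·log2 N butterflies, with no address computed outside the three comparisons. The second stage yields (0, 2), (1, 3), (4, 6), (5, 7), the same pattern as the stage 2 example given for FIG. 4.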

[0036] To calculate the loops of the three stages having a butterfly operation loop, a group
operation loop, and a stage operation loop, a comparison is carried out three times. A loop
count max value and a group count max value respectively represent the number of the
butterflies and the number of the groups that are included in each of the groups and the
stages. If the loop count value and the group count value respectively reach their max value,
the operation continues to a next group or stage. The group offset represents the address
modification value when the group is altered.

[0037] FIG. 9 shows the configuration of the FFT address generator 130 to carry out the
operations shown in FIG. 8. Referring to FIG. 9, the FFT address generator 130 includes a
logical sum logic 131, an adder 132, GR, WR, LCR, and GCR registers 133, a group counter
134, a loop counter 135, a glue logic 136, a first adder 137, a second adder 137', a first
comparator 138, a second comparator 138', and a third comparator 138". The logical sum logic
131 generates an initialization signal of a register to store the loop count value and a
register to store the group count value according to the start signal and a group count match
signal. The adder 132 updates the group offset with a value obtained by multiplying the loop
count max value by 2 and adding the group offset. The GR, WR, LCR, GCR registers 133 store
the group offset, the twiddle factor, the loop count max value, and the group count max
value. The group counter 134 calculates the group count value, and the loop counter 135
calculates the loop count value. The glue logic 136 comprises a logic which generates a
signal to initialize the group counter and the loop counter. The first adder 137 outputs the
address of the input data A by adding the group offset and the loop counter value. The second
adder 137' outputs the address of the input data B by adding the output from the first adder
137 and the loop count max value. The first comparator 138 compares the loop count value and
the loop count max value, the second comparator 138' compares the group counter value and the
group count max value, and the third comparator 138" is input with the N value and the group
count max value and compares the group count max value and the N/2 value.

[0038] If the FFT operation start signal is applied, the loop counter 135 and the group
counter 134 are initialized to a value of 1, and the GR, WR, LCR, GCR registers 133 are
initialized to values of -1, 0, N/2, and 1, respectively. If values of the loop counter 135
and the LCR register 133 are identical, a value of 1 is applied to the loop count match
signal. If values of the group counter 134 and the GCR register 133 are identical, a value of
1 is applied to the group count match signal. The group counter 134 carries out the counting
only if the loop count match signal has a value of 1. The loop counter 135 and the group
counter 134 are re-initialized when the loop count match signal and the group count match
signal have a value of 1, respectively. The GR register 133 has a load input terminal to
update a GR register value and another load input terminal to initialize the value. The WR
register 133 increases a WR register value by 1 if the loop count match signal is 1, and is
initialized to a value of 0 if the group count match signal is 1. The WR register 133 outputs
a bit-reversed value. The LCR register 133 carries out a 1-bit right shift if the group count
match signal equals a value of 1. An initial value of the LCR register 133 is N/2. The GCR
register 133 carries out a 1-bit left shift every time the group count match signal is
applied. If the GCR register value becomes N, the FFT operation stop signal is generated.
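
The shift-register behavior of the LCR and GCR registers and the bit-reversed output of the WR register can be modeled as follows. This is a hedged sketch: the function names are hypothetical, and the bit width used for the reversal is a parameter chosen by the caller, since the text does not state the width of the WR register.

```python
def bit_reverse(value, width):
    """Reverse the lowest `width` bits of value (width is an assumed parameter)."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def register_schedule(n_points):
    """Return the (LCR, GCR) values held during each stage of an N-point FFT.

    LCR starts at N/2 and shifts right by one bit per stage; GCR starts at 1
    and shifts left by one bit per stage; the stop signal occurs when GCR
    reaches N.
    """
    lcr, gcr = n_points // 2, 1
    schedule = []
    while gcr != n_points:
        schedule.append((lcr, gcr))
        lcr >>= 1   # 1-bit right shift on the group count match signal
        gcr <<= 1   # 1-bit left shift on the group count match signal
    return schedule
```

For N = 8 the schedule is (4, 1), (2, 2), (1, 4): the number of butterflies per group halves while the number of groups doubles, using only shifts rather than multipliers or dividers. With a two-bit width, the WR counter values 0, 1, 2, 3 map to 0, 2, 1, 3 after bit reversal.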

[0039] The offset address generated in the FFT address generator 130 is input to an offset
register of the programmable processor and used as an offset for a base address. An aspect of
the current invention includes a programmable processor that uses plural arithmetic and logic
units to calculate the address. Hence, three final data addresses are calculated by using the
offset addresses generated in the FFT address generator 130.

[0040] FIG. 10 shows a configuration of the data processor 150 that efficiently performs the
FFT. Referring to FIG. 10, the data processor 150 includes two multiplier-accumulators and an
arithmetic and logic unit to carry out the butterfly operation, a data bus switch circuit to
control data according to the operation flow, eight input registers, and three accumulators.
By using four multiplexers, the multiplier-accumulator according to an aspect of the present
invention functions as two separate multiplier-accumulators or carries out a function of
adding and accumulating two multiplied results.

[0041] FIG. 11A shows a configuration of a conventional dual multiplier-accumulator having two
separate multiplier-accumulators to output two accumulated results. FIG. 11B shows a
configuration capable of accumulating a sum of two multiplied results by using a 3-input
adder. FIG. 11C shows a dual multiplier-accumulator capable of carrying out the above
operations by using the multiplexer according to an aspect of the present invention. If a
selection input of the multiplexer has a value of 0, the dual multiplier-accumulator operates
similarly to FIG. 11A, and if a selection input has a value of 1, the dual
multiplier-accumulator operates similarly to FIG. 11B. Six input registers store values for
a_r, a_i, b_r, b_i, w_r, and w_i, respectively. Three accumulators are required to store two
multiplier-accumulator values and one arithmetic and logic unit value.
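
The two operating modes selected by the multiplexer can be sketched behaviorally. The function below is an illustration only: its name and argument order are hypothetical, and leaving the second accumulator unchanged in the 3-input adder mode is an assumption, since the text does not say what happens to it.

```python
def dual_mac(acc0, acc1, x0, y0, x1, y1, select):
    """Behavioral model of a dual multiplier-accumulator with a mode select.

    select == 0: two separate multiplier-accumulators (as in FIG. 11A).
    select == 1: both products are summed into one accumulator through a
                 3-input adder (as in FIG. 11B); acc1 is left unchanged here,
                 which is an assumption of this sketch.
    """
    product0 = x0 * y0
    product1 = x1 * y1
    if select == 0:
        return acc0 + product0, acc1 + product1
    return acc0 + product0 + product1, acc1
```

In the 3-input adder mode, one call with acc0 holding an input value and the two products w_r·b_i and w_i·b_r yields a sum of the form appearing in formula 4 in a single step.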

[0042] FIG. 12 shows the data bus switch of the data processor 150. The data bus switch is
implemented using six 2x1 multiplexers adapted to a data bus switch of a conventional
digital signal processor without having to re-design the circuit.

[0043] As aforementioned, the FFT operation method and a circuit to implement the FFT
operation method are provided to enhance performance by minimizing the operation cycles
that occur in the looping instruction, the data move, and the address calculation of
the butterfly input data in addition to the butterfly operation. In the conventional
programmable processor, performance is not enhanced through the acceleration of the
butterfly operation. Further, according to an aspect of the present invention, the operating
apparatus of the digital signal processor is re-used by including the FFT address generator
130 and the switch circuit of the data to thereby enhance the performance and facilitate the
design and the modification.

[0044] Table 2 shows the comparison between the conventional programmable processors and the
present invention according to the number of the FFT operation cycles together with the
number of the multiplier-accumulators. The configuration according to the present invention
does not generate additional operation cycles except for the butterfly operation. Compared
with a conventional digital signal processor having the same number of the
multiplier-accumulators, the 256-point FFT has performance enhanced by 16% to 57%.

[0045] Therefore, the FFT operating apparatus according to the present invention applies less
hardware to the conventional programmable processor to thereby reduce the number of the FFT
operation cycles, provide design flexibility to an FFT processor which has been implemented
with a conventional on-demand semiconductor chip, and allow real-time processing of an
advanced telecommunication system.



[Table 2]

| Digital signal processor | Number of butterfly operation cycles | N=256 | N=1024 | Formula | Number of MACs |
|--------------------------|--------------------------------------|-------|--------|---------|----------------|
| DSP1620 | - | 16065 | - | - | 1 |
| DSP56602 | 8 | 9600 | 49680 | - | 1 |
| DSP56303 | - | 9096 | - | - | 1 |
| TMS320C54x | 8 | 8542 | 42098 | - | 1 |
| TMS320C55x | 5 | 4786 | - | - | 2 |
| TMS320C62x | 4 | 4225 | 20815 | $(4N/2)\log_2 N + 7\log_2 N + N/4 + 9$ | 2 |
| TMS320C67x | - | 4286 | 20716 | $(4N/2)\log_2 N + 23\log_2 N + 6$ | 2 |
| Carmel DSP core | 2 | 2452 | 11624 | $(2N/2)\log_2 N + 5N/4 + 10\log_2 N + 4$ | 2 |
| Palm DSP core | 2 | - | - | - | 2 |
| Frio core | 3 | 3176 | - | - | 2 |
| StarCore (SC140) | 1.5 | - | - | - | 4 |
| Configuration according to an aspect of the present invention | 2 | 2051 | 10243 | $(2N/2)\log_2 N + 3$ | 2 |
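
The 256-point entries of Table 2 can be compared directly. The sketch below is illustrative: the dictionary holds only the two-MAC processors that have an N = 256 entry in Table 2, and the names `TABLE2_CYCLES_N256`, `PRESENT_INVENTION_N256`, and `cycle_reduction` are hypothetical.

```python
# Cycle counts for N = 256 taken from Table 2, two-MAC processors only.
TABLE2_CYCLES_N256 = {
    "TMS320C55x": 4786,
    "TMS320C62x": 4225,
    "TMS320C67x": 4286,
    "Carmel DSP core": 2452,
    "Frio core": 3176,
}
PRESENT_INVENTION_N256 = 2051  # Table 2 entry for the present configuration


def cycle_reduction(baseline_cycles, improved_cycles):
    """Fraction of the baseline cycles that is saved."""
    return (baseline_cycles - improved_cycles) / baseline_cycles


reductions = {
    name: cycle_reduction(cycles, PRESENT_INVENTION_N256)
    for name, cycles in TABLE2_CYCLES_N256.items()
}
smallest = min(reductions, key=reductions.get)
largest = max(reductions, key=reductions.get)
```

Under these table values, the smallest saving is against the Carmel DSP core (about 16%) and the largest is against the TMS320C55x (about 57%).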













[0046] Although a few preferred embodiments of the present invention have been shown
and described, it will be appreciated by those skilled in the art that changes may be made in
these embodiments without departing from the principles and spirit of the invention, the
scope of which is defined in the claims and their equivalents.